Skip to content

Fix walrus truthiness on metrics bounds and identity-partition projection#3412

Merged
Fokko merged 4 commits into
apache:mainfrom
kevinjqliu:kevinjqliu/walrus-metrics-bounds
May 25, 2026
Merged

Fix walrus truthiness on metrics bounds and identity-partition projection#3412
Fokko merged 4 commits into
apache:mainfrom
kevinjqliu:kevinjqliu/walrus-metrics-bounds

Conversation

@kevinjqliu
Copy link
Copy Markdown
Contributor

@kevinjqliu kevinjqliu commented May 24, 2026

Inspired by the walrus issue in #3353

Summary

Several if x := dict.get(k): checks treated legitimate falsy values as missing:

  • lower_bounds.get(field_id) / upper_bounds.get(field_id) return bytes. b"" is a valid serialization of an empty string and was being skipped, causing metrics-based row filtering to return ROWS_MIGHT_MATCH when it should have been ROWS_CANNOT_MATCH (and vice versa for strict eval).
  • accessors[...].get(file.partition) can return 0 or "" for an IdentityTransform partition. The walrus dropped it, so projected partition columns were filled with null instead of the actual value.
  • inspect.py lower_bound / upper_bound rendering had the same b"" issue, showing None instead of "" in readable_metrics.

All conditions are switched to explicit is not None checks. A small _readable_bound helper deduplicates the inspect rendering.

Changes

  • pyiceberg/expressions/visitors.py_InclusiveMetricsEvaluator and _StrictMetricsEvaluator bound lookups (21 sites).
  • pyiceberg/io/pyarrow.py_get_column_projection_values and ArrowProjectionVisitor missing-field handling.
  • pyiceberg/table/inspect.py — extract _readable_bound, use it in both the entries and _get_files_from_manifest rendering paths.

Tests

  • tests/expressions/test_evaluator.py — inclusive and strict evaluators with empty-string bounds, covering lower-only, upper-only, and both-bounds branches.
  • tests/io/test_pyarrow.py — parametrized identity-transform projection with falsy partition values (0, "", None).
  • tests/table/test_inspect.py_readable_bound helper plus integration tests via tbl.inspect.entries() and tbl.inspect.files() for empty-string and null bounds.

@kevinjqliu kevinjqliu requested a review from Fokko May 24, 2026 21:58
Copy link
Copy Markdown
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, this is a great one! Luckily this does not lead to correctness issues, but to a performance hit, since it might open files that don't contain relevant data.

@Fokko Fokko merged commit 720708a into apache:main May 25, 2026
16 checks passed
@Fokko
Copy link
Copy Markdown
Contributor

Fokko commented May 25, 2026

Thanks @kevinjqliu for fixing this, and thanks @ndrluis for the review 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants